Project - Artificial Neural Networks - Part 4 (Classification)

BUSINESS CONTEXT:

Recognising multi-digit numbers in photographs captured at street level is an important component of modern-day map making. A classic example of a corpus of such street-level photographs is Google’s Street View imagery, composed of hundreds of millions of geo-located 360-degree panoramic images.
The ability to automatically transcribe an address number from a geo-located patch of pixels and associate the transcribed number with a known street address helps pinpoint, with a high degree of accuracy, the location of the building it represents. More broadly, recognising numbers in photographs is a problem of interest to the optical character recognition community.
While OCR on constrained domains like document processing is well studied, arbitrary multi-character text recognition in photographs is still highly challenging. This difficulty arises due to the wide variability in the visual appearance of text in the wild on account of a large range of fonts, colours, styles, orientations, and character arrangements.
The recognition problem is further complicated by environmental factors such as lighting, shadows, specularity, and occlusions, as well as by image-acquisition factors such as resolution, motion, and focus blur. In this project we will use a dataset of images centred on a single digit (many of the images do contain some distractors at the sides). Although this sample of the data is simpler than the full task, it is still more complex than MNIST because of the distractors.

PROJECT OBJECTIVE:

We will build a digit classifier on the SVHN (Street View House Numbers) dataset.

STRATEGY:

  1. Use a multi-stage strategy to tune the hyperparameters of the ANN using an open-source search-automation package. We have chosen Optuna for this purpose.
  2. Develop a web-app playground to test different combinations of hyperparameter values and visualise the training and testing process using a live plot.

(1) Import all Python Libraries

(2) Data loading and verification

(4) Creating Helper Classes for Model creation and Live plotting

(4a) Class to create and customize DNNs.

This is a generic helper class to create an N-layered ANN, with the flexibility to place BatchNormalization, Activation and Dropout layers.
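To make the idea concrete, here is a minimal, framework-agnostic sketch of how such a helper can assemble the layer sequence. The function name, parameters and defaults are illustrative assumptions, not the notebook's actual class; in the notebook each tuple would map onto the corresponding Keras layer (`Dense`, `BatchNormalization`, `Activation`, `Dropout`).

```python
def build_layer_plan(n_hidden, units_per_layer, activation="relu",
                     use_batchnorm=True, dropout_rate=0.2):
    """Return the ordered layer specification for an N-layered ANN.

    Hypothetical sketch: each (name, arg) tuple stands in for one
    Keras layer; the helper class would instantiate them in order.
    """
    plan = [("Flatten", None)]
    for i in range(n_hidden):
        plan.append(("Dense", units_per_layer[i]))
        if use_batchnorm:
            plan.append(("BatchNormalization", None))
        plan.append(("Activation", activation))
        if dropout_rate:
            plan.append(("Dropout", dropout_rate))
    # Output head for the 10 digit classes
    plan.append(("Dense", 10))
    plan.append(("Activation", "softmax"))
    return plan

plan = build_layer_plan(2, [128, 64])
```

Keeping the architecture as data like this is what makes it easy for a tuner to vary the number of layers and the placement of BatchNormalization/Dropout per trial.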

(4b) Callback Class for Live plotting during training

This is a helper class that live-plots the training progress during modelling.
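The core of such a callback is simply accumulating the per-epoch metrics and redrawing a chart. The sketch below is a hypothetical stand-in: a real version would subclass `tf.keras.callbacks.Callback` and redraw with matplotlib/IPython display inside `on_epoch_end`; here only the metric bookkeeping is shown so the snippet runs without TensorFlow.

```python
class LivePlotCallback:
    """Minimal sketch of a Keras-style live-plot callback.

    Keras calls on_epoch_end(epoch, logs) after every epoch with a
    logs dict such as {"loss": ..., "accuracy": ..., "val_loss": ...}.
    The real callback would clear and redraw a matplotlib figure from
    self.history at this point.
    """
    def __init__(self):
        self.history = {}

    def on_epoch_end(self, epoch, logs=None):
        # Append each reported metric to its running series
        for key, value in (logs or {}).items():
            self.history.setdefault(key, []).append(value)

# Simulate two epochs of training (values are made up)
cb = LivePlotCallback()
cb.on_epoch_end(0, {"loss": 1.2, "accuracy": 0.55})
cb.on_epoch_end(1, {"loss": 0.9, "accuracy": 0.68})
```

Passing an instance of the real callback via `model.fit(..., callbacks=[cb])` is what wires it into training.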

(5) Pre-Modelling Baseline

Strategy:

  1. Use the 32 x 32 image with a baseline model and measure its performance.
  2. Crop the image to 22 x 22 and use that to build a baseline model and performance score.
  3. We will use `categorical_crossentropy` as the loss function and accuracy as the performance measure throughout the exercise.
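The 32 x 32 to 22 x 22 crop in step 2 can be sketched as a centred NumPy slice; the function name and batch layout below are assumptions, not the notebook's code.

```python
import numpy as np

def center_crop(images, size=22):
    """Crop a batch of square images to a centred size x size patch.

    images: array of shape (n, H, W) or (n, H, W, C); for SVHN the
    crop trims the 5-pixel border on each side of a 32 x 32 image,
    removing most of the distractor digits at the edges.
    """
    h, w = images.shape[1], images.shape[2]
    top = (h - size) // 2
    left = (w - size) // 2
    return images[:, top:top + size, left:left + size]

batch = np.zeros((4, 32, 32))
cropped = center_crop(batch)  # shape (4, 22, 22)
```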

(5a) Without Cropping the image

Observations:

  1. We have obtained an accuracy of 0.73928, i.e. almost 74%, as the baseline score with the 32 x 32 image.

(5b) Cropping the image to 22 X 22

Observations:

  1. The overlapping digits around the boundary are removed and the actual digit itself is clearly visible.

Observations:

  1. We find that the cropped 22 x 22 image has provided a better performance score than the 32 x 32 image, by a margin of almost 4%.
  2. We will use the cropped image for tuning the hyperparameters.

(6) Hyperparameter Tuning using Optuna

https://optuna.readthedocs.io/en/stable/

Hyperparameter tuning strategy:

  1. Architecture Selection: find the optimal number of layers, number of neurons per layer, and the optimal combination of Activation, BatchNormalization and Dropout.
  2. Coarse Tuning: find the optimal weight-initialization parameters, activation type and dropout rate at each layer.
  3. Fine Tuning: find the optimal optimizer and its parameters.
  4. Final Tuning: find the optimal number of epochs and batch size.

At each stage, the parameters determined in the previous stage flow into the next one to override the defaults.
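This stage-to-stage flow amounts to successive dictionary merges: each stage's best parameters override the running defaults, and untouched defaults carry through to the end. The parameter names and values below are hypothetical placeholders, not the tuned results.

```python
def merge_stages(defaults, *stage_results):
    """Fold each stage's best parameters over the defaults in order.

    Later stages win on conflicts; anything never tuned keeps its
    default value all the way to the final configuration.
    """
    params = dict(defaults)
    for best in stage_results:
        params.update(best)
    return params

defaults = {"n_layers": 3, "activation": "relu",
            "optimizer": "adam", "epochs": 30}
stage1 = {"n_layers": 5}                                   # architecture selection
stage2 = {"activation": "selu", "dropout": 0.2}            # coarse tuning
stage3 = {"optimizer": "rmsprop", "learning_rate": 1e-3}   # fine tuning
final = merge_stages(defaults, stage1, stage2, stage3)
```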

Note:
    In the interest of time we have reduced the number of `Trials` (iterations). Increasing the number of trials improves the chances of finding a better set of parameters, but does not guarantee it.

(6.1) Architecture Selection by hyperparameter tuning
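An Optuna search is driven by an objective function that draws each hyperparameter from the trial via the `suggest_*` API. The sketch below shows the shape of such an objective for architecture selection; the parameter names and ranges are illustrative, and a stub trial object (with the same `suggest_*` method names as `optuna.trial.Trial`) plus a random stand-in score keep it runnable without Optuna or TensorFlow installed.

```python
import random

class StubTrial:
    """Stand-in for optuna.trial.Trial so the sketch runs without
    Optuna; the real object exposes these same suggest_* methods."""
    def suggest_int(self, name, low, high):
        return random.randint(low, high)
    def suggest_float(self, name, low, high):
        return random.uniform(low, high)
    def suggest_categorical(self, name, choices):
        return random.choice(choices)

def objective(trial):
    # Architecture search space (ranges are illustrative, not the notebook's)
    config = {
        "n_layers": trial.suggest_int("n_layers", 2, 6),
        "block_order": trial.suggest_categorical(
            "block_order", ["dropout_bn_act", "bn_act_dropout"]),
        "dropout": trial.suggest_float("dropout", 0.0, 0.5),
    }
    config["units"] = [trial.suggest_int(f"units_{i}", 32, 256)
                       for i in range(config["n_layers"])]
    # Real objective: build and briefly train the model from `config`,
    # then return its validation accuracy. A random stand-in keeps
    # this sketch self-contained.
    return random.random()

score = objective(StubTrial())
```

With Optuna installed, the same objective would be run via `study = optuna.create_study(direction="maximize")` followed by `study.optimize(objective, n_trials=...)`.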

Observations:

  1. An optimal combination of 5 hidden layers, with Dropout, BatchNormalization and Activation in that order, has yielded an accuracy of almost 85%, an increase of almost 7%.

(6.2) Coarse hyperparameter tuning

Observations:

  1. The coarse tuning has not yielded a better performance; in fact it is marginally lower, by 0.3%.

(6.3) Fine tuning - Hunt for optimizer

(6.4) Final hyperparameter tuning

(6.5) Testing the model with tuned hyperparameters and manual tuning

Observations:

  1. It is interesting to see that the curve has stabilised between 50 and 100 epochs, and that the validation accuracy is consistent with the training accuracy. This means the model is a good fit.
  2. With the given hyperparameters the validation accuracy has gone beyond 90%, and the test accuracy is close to it, i.e. 89%.

(6.6) Classification Report
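A classification report breaks the overall accuracy down into per-class precision, recall and F1. The notebook presumably uses `sklearn.metrics.classification_report`; the plain-NumPy stand-in below (a hypothetical helper, not the notebook's code) makes the computation behind those columns explicit for the 10 digit classes.

```python
import numpy as np

def classification_report(y_true, y_pred, n_classes=10):
    """Per-class (precision, recall, F1) rows for a digit classifier.

    A NumPy stand-in for sklearn.metrics.classification_report:
    for each class c, counts true/false positives and false negatives
    against the rest of the classes.
    """
    rows = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        rows.append((c, precision, recall, f1))
    return rows

# Tiny worked example with three classes
y_true = np.array([0, 1, 1, 2])
y_pred = np.array([0, 1, 0, 2])
report = classification_report(y_true, y_pred, n_classes=3)
```

In the worked example, class 0 has precision 0.5 (one of its two predictions is wrong) but recall 1.0, which is exactly the kind of asymmetry the report surfaces that a single accuracy number hides.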